Hybrid approaches for automatic vowelization of Arabic texts

Authors

  • Mohamed Bebah
  • Amine Chennoufi
  • Azzeddine Mazroui
  • Lakhouaja Abdelhak
Abstract

This article presents hybrid approaches for the automatic vowelization of Arabic texts. The process consists of two modules. The first performs a morphological analysis of the words in the text using the open-source morphological analyzer AlKhalil Morpho Sys. For each word, analyzed out of context, the output is the set of its possible vowelizations. Integrating this analyzer into our vowelization system required adding a lexical database containing the most frequent words in the Arabic language. The second module resolves the remaining ambiguities with a statistical approach based on two hidden Markov models (HMMs). In the first HMM, the unvowelized Arabic words are the observed states and the vowelized words are the hidden states. The observed states of the second HMM are identical to those of the first, but its hidden states are the lists of possible diacritics of the word, without its Arabic letters. Our system uses the Viterbi algorithm to select the optimal path among the solutions proposed by AlKhalil Morpho Sys. Our approach opens a promising way to improve the performance of automatic vowelization of Arabic texts for other uses in natural language processing.
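The disambiguation step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes that a morphological analyzer has already returned a list of candidate vowelizations for each word, that every candidate emits its unvowelized word with probability 1 (so emission terms drop out), and that unseen events get a fixed floor probability. The function name `viterbi`, the Latin-letter placeholder words, and all probabilities are hypothetical toy values chosen for illustration.

```python
import math


def viterbi(candidates, log_init, log_trans, floor=math.log(1e-6)):
    """Return the best-scoring sequence with one candidate per word.

    candidates: list of lists; candidates[i] holds the vowelized forms
                proposed for word i (hypothetical analyzer output).
    log_init:   dict candidate -> log P(candidate starts the sentence).
    log_trans:  dict (prev, cur) -> log P(cur | prev).
    floor:      log-probability for unseen events (crude smoothing, assumed).
    """
    if not candidates:
        return []
    # scores[i][c]: best log-score of any path ending in candidate c at word i
    scores = [{c: log_init.get(c, floor) for c in candidates[0]}]
    back = []  # back[i][c]: best predecessor of c at word i + 1
    for options in candidates[1:]:
        prev = scores[-1]
        cur_scores, cur_back = {}, {}
        for c in options:
            best_prev = max(
                prev, key=lambda p: prev[p] + log_trans.get((p, c), floor)
            )
            cur_scores[c] = prev[best_prev] + log_trans.get((best_prev, c), floor)
            cur_back[c] = best_prev
        scores.append(cur_scores)
        back.append(cur_back)
    # Follow the back-pointers from the best final candidate.
    path = [max(scores[-1], key=scores[-1].get)]
    for cur_back in reversed(back):
        path.append(cur_back[path[-1]])
    path.reverse()
    return path


# Toy data: placeholder transliterations, made-up probabilities.
lp = math.log
candidates = [
    ["kataba", "kutubN", "kutiba"],   # candidates for unvowelized "ktb"
    ["Alwaladu", "Alwalada"],         # candidates for "Alwld"
    ["Aldarsa", "Aldarsu"],           # candidates for "Aldrs"
]
log_init = {"kataba": lp(0.6), "kutiba": lp(0.3), "kutubN": lp(0.1)}
log_trans = {
    ("kataba", "Alwaladu"): lp(0.7),
    ("kataba", "Alwalada"): lp(0.3),
    ("kutiba", "Alwaladu"): lp(0.5),
    ("kutiba", "Alwalada"): lp(0.5),
    ("Alwaladu", "Aldarsa"): lp(0.8),
    ("Alwaladu", "Aldarsu"): lp(0.2),
    ("Alwalada", "Aldarsa"): lp(0.4),
    ("Alwalada", "Aldarsu"): lp(0.6),
}

best_path = viterbi(candidates, log_init, log_trans)
print(best_path)
```

Restricting the search to the analyzer's candidates keeps the state space small at each position. The same routine could in principle be reused for the second HMM by replacing each candidate with its diacritic pattern, although how the two models are combined is not specified in the abstract.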


Similar resources

Off-line Arabic Handwritten Recognition Using a Novel Hybrid HMM-DNN Model

To facilitate the entry of data into the computer and its digitization, automatic recognition of printed texts and manuscripts is a considerable aid to many applications. Research on automatic document recognition started decades ago with the recognition of isolated digits and letters, and today, due to advancements in machine learning methods, efforts are being made to iden...


The Role of Morphology and Short Vowelization in Reading Morphological Complex Words in Arabic: Evidence for the Domination of the Morpheme/Root-Based Theory in Reading Arabic

This study investigated the reading accuracy of 59 adult highly skilled native Arabic readers in reading morphological complex Arabic words in 6 reading conditions: Isolated words with short vowelization, isolated words without short vowelization, sentences with roots with short vowelization, sentences with roots without short vowelization, sentences without priming roots with short vowelizatio...


An Automatic Punctuation Marks System For Arabic Texts

This work presents a system for automatic insertion of punctuation marks in Arabic. Existing approaches to automatic punctuation do not perform adequately on Arabic texts and do not meet user needs. The importance of, and rising need for, automating the correct insertion of punctuation marks in Arabic texts led to a need for specific analysis of the Arabic language to introduce approaches th...


Using Machine Learning Algorithms for Automatic Cyber Bullying Detection in Arabic Social Media

Social media allows people to interact and express their thoughts or feelings about different subjects. However, some users may write offensive tweets to others via social media, which is known as cyberbullying. Successful prevention depends on automatically detecting malicious messages. Automatic detection of bullying in the text of social media by analyzing the text "twits" via one of the machine l...


Statistical vowelization of Arabic text for speech synthesis in speech-to-speech translation systems

Vowelization presents a principal difficulty in building text-to-speech synthesizers for speech-to-speech translation systems. In this paper, a novel log-linear modeling method is proposed that takes into account vowel and diacritical information at both the word level and character level. A unique syllable-based normalization algorithm is then introduced to enhance both word coverage and data c...



Journal:
  • CoRR

Volume abs/1410.2646

Publication date 2014